Topic Detection in Read Documents
Abstract
In this paper, we address the importance of topic annotation in the speech retrieval domain and the problems it involves. Having identified these problems, we describe an algorithm developed to perform automatic topic annotation of broadcast news (BN) speech corpora. The approach is based on Hidden Markov Models (HMMs) and topic language models, and it solves the topic segmentation and labelling tasks simultaneously. To overcome the lack of topic-labelled material for training the statistical models, a two-stage unsupervised clustering procedure was developed. Both stages are based on nearest-neighbour search, using the Kullback-Leibler divergence as the distance measure. Ongoing experiments to evaluate the system's performance are also described.
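To make the clustering idea concrete, here is a minimal sketch of one nearest-neighbour clustering pass with the Kullback-Leibler divergence as the distance. The abstract does not say how documents are represented, how the models are smoothed, in which direction the divergence is taken, or how the two stages differ, so all of that is assumed here: each story is a bag of words, it is modelled by an additively smoothed unigram distribution, and it joins the closest existing cluster if D(document || cluster) is below a hypothetical `threshold`, otherwise it starts a new cluster. The function names are illustrative, not the authors'.

```python
import math
from collections import Counter


def unigram_model(counts, vocab, alpha=0.1):
    """Additively smoothed unigram distribution over a fixed vocabulary (assumed smoothing)."""
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """D(p || q) = sum_w p(w) * log(p(w) / q(w)); asymmetric and never negative."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def nearest_neighbour_cluster(docs, threshold, alpha=0.1):
    """One pass: each document joins its KL-nearest cluster, or opens a new one.

    `threshold` is a hypothetical tuning parameter, not taken from the paper.
    Returns one cluster index per document.
    """
    vocab = sorted({w for d in docs for w in d})
    cluster_counts = []  # pooled word counts for each cluster
    assignment = []
    for doc in docs:
        doc_counts = Counter(doc)
        p = unigram_model(doc_counts, vocab, alpha)
        best, best_dist = None, float("inf")
        for k, counts in enumerate(cluster_counts):
            dist = kl_divergence(p, unigram_model(counts, vocab, alpha))
            if dist < best_dist:
                best, best_dist = k, dist
        if best is not None and best_dist <= threshold:
            cluster_counts[best].update(doc_counts)  # merge into nearest cluster
            assignment.append(best)
        else:
            cluster_counts.append(Counter(doc_counts))  # start a new cluster
            assignment.append(len(cluster_counts) - 1)
    return assignment
```

For example, `nearest_neighbour_cluster([["football", "goal", "match"], ["goal", "match", "team"], ["election", "vote", "party"], ["vote", "party", "minister"]], threshold=1.0)` groups the two sports stories together and the two politics stories together. Smoothing matters here: without it, any word unseen in a cluster makes the divergence infinite.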
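Segmentation and labelling can be solved in a single pass because, in an HMM whose states are topics, the most likely state sequence assigns a topic to every unit and places a boundary wherever the topic changes. The sketch below shows this with Viterbi decoding. Several details are assumptions, not taken from the abstract: the observation unit is a sentence, each topic language model is a unigram table with a fixed floor for unseen words, the start distribution is uniform, and the transitions use a single hypothetical `p_stay` probability with the remaining mass spread evenly over the other topics.

```python
import math


def sentence_log_prob(words, lm, floor=1e-6):
    """Log-likelihood of a word sequence under a unigram topic LM (floored for unseen words)."""
    return sum(math.log(lm.get(w, floor)) for w in words)


def viterbi_topics(sentences, topic_lms, p_stay=0.9):
    """Return the most likely topic label for each sentence (requires 0 < p_stay < 1)."""
    topics = list(topic_lms)
    n = len(topics)
    log_stay = math.log(p_stay)
    log_switch = math.log((1 - p_stay) / (n - 1)) if n > 1 else float("-inf")
    # Uniform initial distribution over topics.
    score = {t: -math.log(n) + sentence_log_prob(sentences[0], topic_lms[t]) for t in topics}
    backptrs = []
    for words in sentences[1:]:
        new_score, ptr = {}, {}
        for t in topics:
            # Best predecessor: either stay in topic t or switch from another topic.
            prev, best = max(
                ((s, score[s] + (log_stay if s == t else log_switch)) for s in topics),
                key=lambda item: item[1],
            )
            new_score[t] = best + sentence_log_prob(words, topic_lms[t])
            ptr[t] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    path = [max(score, key=score.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


def segments(labels):
    """Collapse per-sentence labels into (topic, start, end) spans, end exclusive."""
    spans, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            spans.append((labels[start], start, i))
            start = i
    return spans
```

With two toy topic models, `segments(viterbi_topics(sentences, topic_lms))` yields spans such as `[("sport", 0, 2), ("politics", 2, 4)]`. A higher `p_stay` penalises boundaries more heavily, which discourages fragmenting a story into many short segments.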
Similar sources
A review of text mining approaches and their function in discovering and extracting a topic
Background and aim: Four text mining methods are examined and focused on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...
Motivation to Read in a Second Language: A Review of Literature
Reading motivation is a well-researched topic in relation to first language literacy development due to its influence on both reading processes and outcomes. In second language reading, the role of motivation has not been as thoroughly explored. The aim of this review of literature is to highlight established studies as well as recent explorations in some recurring areas of first and second lan...
Detection of Topic and its Extrinsic Evaluation Through Multi-Document Summarization
This paper presents a method for detecting words related to a topic (we call them topic words) over time in the stream of documents. Topic words are widely distributed in the stream of documents, and sometimes they frequently appear in the documents, and sometimes not. We propose a method to reinforce topic words with low frequencies by collecting documents from the corpus, and applied Latent D...
Clustering-Based Searching and Navigation in an Online News Source
The growing amount of online news posted on the WWW demands new algorithms that support topic detection, search, and navigation of news documents. This work presents an algorithm for topic detection that considers the temporal evolution of news and the structure of web documents. Then, it uses the results of the topic detection algorithm for searching and navigating in an online news source. An...
A Probabilistic Topic Model Based on Local Word Relationships in Overlapping Windows
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
Publication date: 2000